ggml-vulkan: disable transfer queue on UMA#20441

Open
RipleyTom wants to merge 1 commit into ggml-org:master from RipleyTom:fix_tq_uma

@RipleyTom

Fixes #20439.

I am unsure why transfer queues add such a huge overhead (it was at least 10 GB: I tried reserving that much when loading the model and it still choked).

Summary from big C:

Using the separate SDMA/transfer queue for async copies is counterproductive on UMA (unified memory architecture) devices because:

  1. There's no separate device memory to transfer to/from — it's all unified memory
  2. The SDMA engine and its associated driver structures (kernel buffer objects for command streams, page table entries, cross-queue synchronization state) consume memory from the same pool
  3. The timeline semaphore synchronization between compute and transfer queues adds driver overhead with no benefit — the compute queue can issue buffer copies just as efficiently on UMA
  4. The transfer queue's command pool (transfer_cmd_pool) and its command buffers accumulate during model loading alongside compute_cmd_pool, effectively doubling the command infrastructure
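
To make the reasoning above concrete, here is a minimal sketch of the queue-selection policy it implies. It is not the actual diff: `DeviceInfo`, `CopyQueue`, `pick_copy_queue`, and `command_pool_count` are hypothetical names invented for illustration, and the `uma` flag is assumed to come from whatever check the backend already uses (for example, treating an integrated GPU as UMA).

```cpp
// Hypothetical, simplified stand-in for the device properties a backend would
// query. These names are illustrative and are not ggml-vulkan identifiers.
struct DeviceInfo {
    bool uma;                 // true when CPU and GPU share one memory pool
    bool has_transfer_queue;  // true when a dedicated transfer queue family exists
};

enum class CopyQueue { Compute, Transfer };

// Choose the queue used for async buffer copies.
// On UMA there is no separate device memory to copy to or from, so copies stay
// on the compute queue and no cross-queue synchronization is needed.
constexpr CopyQueue pick_copy_queue(const DeviceInfo &dev) {
    return (!dev.uma && dev.has_transfer_queue) ? CopyQueue::Transfer
                                                : CopyQueue::Compute;
}

// Number of command pools needed under this policy: the compute pool alone,
// or the compute pool plus a separate transfer pool.
constexpr int command_pool_count(const DeviceInfo &dev) {
    return pick_copy_queue(dev) == CopyQueue::Transfer ? 2 : 1;
}
```

Under this policy a UMA device only ever allocates from the compute pool, which is the point of item 4. A discrete GPU with a dedicated transfer queue family keeps the existing async-copy path.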

RipleyTom requested a review from 0cc4m as a code owner on March 12, 2026 05:31
github-actions bot added the labels Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Mar 12, 2026

Development

Successfully merging this pull request may close these issues.

Eval bug: Regression when trying to load a big model
